Don't zero-initialize in export_llama #16886
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/16886
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit a3b33f1 with merge base 5690d26. This comment was automatically generated by Dr. CI and updates every 15 minutes.
@JacobSzwejbka has exported this pull request. If you are a Meta employee, you can view the originating Diff in D91518961.
Summary: Zero initialization is non-standard for PyTorch models, and with ExecuTorch in particular it is misleading because ET greedily deduplicates weights. If you zero-initialize a transformer model, the .pte file ends up much smaller than you would expect if you didn't know about the deduplication.
Differential Revision: D91518961
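A minimal sketch of the effect described above (hypothetical module and helper names; this is not the actual export_llama or ExecuTorch serialization code): every zero-initialized weight tensor has an identical value pattern, so a greedy dedup pass only needs to keep one copy, which is why the exported file can come out much smaller than a realistically initialized model would.

```python
import torch
import torch.nn as nn


class TinyBlockStack(nn.Module):
    """Hypothetical stand-in for a transformer: several identically shaped linear layers."""

    def __init__(self, num_layers: int = 4, dim: int = 64, zero_init: bool = False):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.Linear(dim, dim, bias=False) for _ in range(num_layers)]
        )
        if zero_init:
            for layer in self.layers:
                nn.init.zeros_(layer.weight)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            x = layer(x)
        return x


def unique_weight_buffers(model: nn.Module) -> int:
    """Rough proxy for what a greedy weight-dedup pass sees: count distinct value patterns."""
    seen = set()
    for p in model.parameters():
        seen.add(tuple(p.detach().reshape(-1).tolist()))
    return len(seen)


if __name__ == "__main__":
    zeroed = TinyBlockStack(zero_init=True)
    randomly_initialized = TinyBlockStack(zero_init=False)
    # Every zero-initialized weight tensor is value-identical, so a deduplicating
    # serializer only has to store one copy; random init needs one copy per layer.
    print("distinct buffers, zero init:  ", unique_weight_buffers(zeroed))                # -> 1
    print("distinct buffers, random init:", unique_weight_buffers(randomly_initialized))  # -> 4
```

Run as-is, the zero-initialized model collapses to a single distinct weight buffer while the randomly initialized one keeps one per layer, mirroring why the .pte size of a zero-initialized export is not representative.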
Force-pushed from 907d12e to a3b33f1